Novel algorithm for detection and identification of radioactive materials in an urban environment* 
This study introduces a novel algorithm to detect and identify radioactive materials in urban settings using 
time-series detector response data. To address the challenges posed by varying backgrounds and to enhance the 
quality and reliability of the energy spectrum data, we devised a temporal energy window. This partitioned the 
time-series detector response data, resulting in energy spectra that emphasize the vital information pertaining to 
radioactive materials. We then extracted characteristic features of these energy spectra, relying on the formation 
mechanism and measurement principles of the gamma-ray instrument spectrum. These features encompassed 
aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. This methodology not only simplified the in- 
terpretation of the energy spectra’s physical significance but also eliminated the necessity for peak searching 
and individual peak analyses. Given the requirements of imbalanced multi-classification, we created a detection 
and identification model using a weighted k-nearest neighbors (KNN) framework. This model exploits the fact that 
energy spectra of identical radioactive materials exhibit minimal intra-class variability. Consequently, it considerably 
boosted the classification accuracy of minority classes, enhancing the classifier's overall efficacy. We 
also executed a series of comparative experiments. Established methods for radionuclide identification, 
such as standard k-nearest neighbors (KNN), support vector machine (SVM), Bayesian network, and 
random tree, were used for comparison purposes. Our proposed algorithm realized an F1 measure of 0.9868 
on the time-series detector response data, reflecting a minimum enhancement of 0.3% in comparison to other 
techniques. The results conclusively show that our algorithm outperforms others when applied to time-series 
detector response data in urban contexts. 
I. INTRODUCTION 


Nuclear technology and science have enriched the lives of 
millions globally, with advancements in areas such as clean 
energy, cancer treatment, food security, and pest control. 
However, it is imperative that nuclear and radioactive mate- 
rials employed in these beneficial applications remain secure 
to prevent potential misuse [1]. Data from the Incident and 
Trafficking Database (ITDB) of the International Atomic En- 
ergy Agency (IAEA) reveals that between 1993 and 2020, 
there were 3,686 reported incidents worldwide. Of these, 290 
were confirmed or suspected cases of trafficking or malicious 
use. Notably, 12 incidents involved highly enriched uranium 
(HEU), and 2 featured plutonium [2]. The detection and iden- 
tification of illegal radioactive materials in an urban environ- 
ment is crucial to ensure the safe and legal use of radioactive 
materials, prevent their illegal transfer, and protect the safety 
of the state and its citizens [3, 4]. 


Numerous researchers have investigated the detection and identification of radioactive materials. Most studies focus on conditions where the detector and nuclear material maintain a static position relative to each other. In these cases, the radioactive source is often scaled proportionally and linearly superimposed onto a measured background. However, real measurement environments rarely exhibit a consistent background. Thus, simulations using a constant background intensity do not adequately represent the complexities encountered in actual measurement contexts [5-7]. 


Consider a detection scenario within an urban block, arising either during routine monitoring of radioactive events or when responding to specific incidents involving uncontrolled radioactive material. Experimenters traverse this block, seeking subtle indications of radioactive materials to ascertain their presence. In this urban environment, the dominant contribution to the background comes from naturally occurring radioactive materials (NORM) found in various construction materials such as brick, granite, and concrete [8, 9]. The concentration 
of NORM varies near different buildings due to the unique 
composition of each structure and the environmental condi- 
tions surrounding it. Clearly, the background radiation within 
an urban environment fluctuates based on neighboring struc- 
tures and prevailing environmental factors [10]. Addition- 
ally, radioactive materials can sometimes exhibit low inten- 
sity, with their gamma rays being attenuated by any shielding 
or dense materials surrounding the source. The energy spectra derived from these scenarios may be further complicated by cumulative and peak effects [11]. Given these complexities, traditional methods often struggle to effectively detect 
illicit radioactive materials concealed within buildings or ac- 
curately determine their types. Regrettably, false positives in 
radioactive material detection can lead to grave repercussions, 
wasting valuable time and posing potential health risks to re- 
searchers and the local populace. Consequently, algorithms 
designed for detecting and identifying radioactive materials 
should be resilient against diverse background conditions and 
shielding setups [12]. 

The task of detecting and identifying illicit radioactive 
materials presents significant challenges, and various stud- 
ies have pursued techniques to address them. From a hard- 
ware equipment standpoint, R. R. Flanagan et al. [13] recom- 
mended the use of mobile, distributed sensors to detect nu- 
clear materials in transit. Their research evaluated the efficacy 
of a mobile sensor network in detecting radioactive materials 
by melding radiation transport with geographic information 
systems. 

V. Tran-Quang et al. [14] introduced an internet of radia- 
tion sensor system (IoRSS) designed for the detection of un- 
regulated radioactive materials in scrap metal recycling and 
production facilities. This system enhances the detection, lo- 
calization, and identification of radioactive materials by as- 
similating data from an array of portable radiation detectors. 
Meanwhile, J. T. Li et al. [15] pioneered the nuclide identification and quantitative analysis system (NIQAS), aimed at identifying hazardous substances via MCNP simulations. Central 
to this system are a D-T neutron generator and an HPGe de- 
tector. Various modules within the system were fine-tuned 
utilizing a Signal-to-Noise Ratio (SNR) assessment method. 

Conversely, when faced with hardware constraints, the 
onus shifts to the development of effective algorithms for en- 
ergy spectrum analysis. A myriad of machine learning tech- 
niques, designed to emulate human cognition, have made sig- 
nificant strides in various domains. These include medical 
diagnosis [16], signal processing [17-19], and text classifi- 
cation [20-23]. Within the realm of radioisotope identifica- 
tion and radiation detectors, D. M. Pfund et al. [7] delved 
into defining energy region boundaries and decision metrics 
for gamma-ray spectra. Their research illuminated that se- 
lecting specific energy regions can augment the probability 
of detection in scenarios with low-count or obscured sources. 
Concurrently, C. Li et al. [24] proposed a groundbreaking ap- 
proach for radionuclide identification in urban settings, har- 
nessing a feature enhancer coupled with a one-dimensional 
neural network. Their methodology adeptly preprocesses 
the input energy spectrum data via the feature enhancer and 
seizes nonlinear information using the neural network. 

S. Wu et al. [25] devised a peak-searching technique us- 
ing a generative adversarial network (GAN) tailored for urban 
environments characterized by low count rates and brief mea- 
surements of single nuclide spectra. This GAN-centric ap- 
proach outperforms the symmetric zero-area (SZA) method 
in accurately pinpointing characteristic peaks. By signif- 
icantly reducing both the likelihood and number of false 
peaks, it bolsters the overall efficacy of peak recognition. 
Nonetheless, the quest to detect and identify illicit radioactive materials faces enduring challenges, including diminished detection sensitivity and the sway of environmental factors. As such, ongoing research is imperative to refine the precision and dependability of these techniques. 

This study introduces a novel algorithm for the detection 
and identification of radioactive materials within urban en- 
vironments. Our approach aims to offer a fresh solution to 
detect and identify radioactivity against the backdrop of com- 
plex urban settings, both during routine monitoring and in 
scenarios involving the uncontrolled dispersal of radioactive 
substances. Initially, the time-series detector response data, 
collected from an urban setting, were segmented using a tem- 
poral energy window. We then extracted distinct features 
from the energy spectra, drawing on the formation mecha- 
nism and measurement principle inherent to gamma-ray in- 
strument spectra. These key features encompass aggregated 
counts, peak-to-flat ratios, and peak-to-peak ratios. Given the 
need for imbalanced multi-classification, we crafted a detec- 
tion and identification model grounded in the weighted KNN 
architecture. 


II. METHOD 


The proposed method unfolds in three pivotal steps: (a) To 
contend with the variability of backgrounds and accentuate 
the primary information from the radiation source, the time- 
series detector response data was segmented using a tempo- 
ral energy window. (b) For a comprehensive analysis and to 
elucidate the physical implications of an energy spectrum, 
distinct features were drawn from the energy spectra. This 
extraction leaned on the formation mechanism and measure- 
ment principle of the gamma-ray instrument spectrum, incor- 
porating features such as aggregated counts, peak-to-flat ra- 
tios, and peak-to-peak ratios. (c) With the aim of enhanc- 
ing the resilience and precision of the model for detection 
and identification tasks within urban settings, we fashioned 
a model rooted in the weighted KNN architecture. The sequence of our proposed algorithm is illustrated in Fig. 1. 


A. Temporal energy window 


In this subsection, the temporal energy window is proposed 
for sample processing of time-series detector response data. 
Samples were partitioned into multiple segments with con- 
sideration of the sample type. 

Urban landscapes teem with roads and structures composed of natural and man-made substances. Naturally occurring radioactive materials (NORM) are inherent in these substances, with concentrations differing across materials. Predominantly, NORM comprises isotopes such as $^{40}$K, $^{238}$U, and $^{232}$Th, along with the radioactive daughter products of the latter two, commonly denoted as KUT [26]. As detectors navigate the search zone, particularly when radioactive substances are unmonitored, the makeup of the neighboring structures and their ensuing radioactive signatures shift with each locale [27]. Consequently, the cumulative gamma photon count rate and spectra recorded by detectors might demonstrate notable fluctuations [28]. Adding to the complexity, illicit radioactive substances may be concealed, leading to attenuated detection signals that are challenging to identify. The interplay between gamma photons and diverse substances, mediated by various physical processes, amplifies the dynamism of the observed radiation background signal. Hence, the time-series detector response data acquired in urban settings are profoundly shaped by ambient conditions, often overshadowing the distinctive peaks that mark the presence of radioactive materials in the energy spectra. 

Fig. 1. (Color online) Block diagram of the proposed method. Partitioning the time-series detector response data using temporal energy windows, and converting the resulting corresponding segments into spectral form. Extracting features, such as aggregated counts, peak-to-flat ratios, and peak-to-peak ratios, based on the formation mechanism and measurement principle of the gamma-ray instrument spectrum. Constructing a weighted KNN-based detection and identification model for the imbalanced multi-classification problem in urban environment radiation detection. 


To counteract the effects of variable backgrounds, enhance 
the integrity and dependability of the energy spectrum data, 
and streamline subsequent data processing, we segmented the 
time-series detector response data using a temporal energy 
window. This strategy primarily underscores the features of 
faint radioactive materials. 


In an urban setting, the detection system operates under 
two potential conditions: with or without the presence of an 
auxiliary radiation source, which is contextualized against the 
background radiation. Consequently, the time-series detector 
response dataset encompasses active and passive samples. An 
active sample pertains to the detector response data captured 
in the presence of a radioactive source, while a passive sam- 
ple relates to data collected in an environment devoid of any 
radioactive source. 


The time-series detector response dataset is defined as $T$. 

$$T = \{(S_1, y_1, loc_1), (S_2, y_2, loc_2), \ldots, (S_M, y_M, loc_M)\}$$

$S_i$ is the matrix of time-series detector response data and is defined by Eq. 1. Furthermore, $M$ denotes the number of samples in the dataset, $i = 1, 2, \ldots, M$. 

$t^i$ denotes the time series, $t^i_j \in \mathbb{Q}^+$, and $e^i$ denotes the energy value recorded by the detector over time, with $e^i_j \in \mathbb{Q}^+$. $m$ denotes the length of $S_i$, with $j = 1, 2, \ldots, m$. 

When $S_i$ is an active sample, $y_i \in \mathcal{Y}$ is the class label of $S_i$. Furthermore, $loc_i \in \mathbb{Q}^+$ denotes the time point when the detector is closest to the radioactive source position during the movement, $t^i_1 \le loc_i \le t^i_m$. When $S_i$ is a passive sample, $y_i = 0$ and $loc_i = 0$. 

The temporal energy window is proposed for sample processing of time-series detector response data. The quantity and length of a temporal energy window are defined as $w_q \in \mathbb{Z}^+$ and $w_l \in \mathbb{Z}^+$, respectively. The key to utilizing a temporal energy window for processing time-series detector response data lies in determining the temporal origin of the window, which refers to the initial point from which the temporal energy window conducts the partition task, thereby determining the position of the window within the time series. The temporal origin of an energy window is defined as $t_{j'}$, and $j'$ is calculated by Eq. 2. 


; m m 1 
j’ = {[0, wq — 1] x | |+ E [p <B> {argmin [loci — ty Als 
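The window placement that Eq. 2 encodes can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `window_origins` and `segment` are hypothetical, the even spacing `m // w_q` used for passive samples is an assumed layout, and the clamping of windows to the record boundaries is an added safeguard. Only the active-sample rule (a window of length `w_l` centred on the event closest in time to `loc`) follows directly from the text.

```python
import numpy as np

def window_origins(t, loc, w_q, w_l, active):
    """Return the start indices j' of the temporal energy windows of one sample."""
    t = np.asarray(t, dtype=float)
    m = len(t)
    if active:
        # Active sample: one window centred on the event nearest to closest approach.
        origins = [int(np.argmin(np.abs(t - loc))) - w_l // 2]
    else:
        # Passive sample (assumed layout): w_q origins spread evenly over the record.
        step = m // w_q
        origins = [k * step for k in range(w_q)]
    # Clamp so that each window stays inside the record (added safeguard).
    last_start = max(m - w_l, 0)
    return [min(max(j, 0), last_start) for j in origins]

def segment(t, e, loc, w_q, w_l, active):
    """Cut the (time, energy) record into windows of w_l consecutive events."""
    t, e = np.asarray(t), np.asarray(e)
    starts = window_origins(t, loc, w_q, w_l, active)
    return [(t[j:j + w_l], e[j:j + w_l]) for j in starts]
```

With this layout a passive record yields `w_q` segments while an active record yields a single segment around the source, which matches the statement that the number of samples may grow after segmentation.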


In Eq. 2, $b$ denotes a Boolean variable. If the present sample is active, $b$ is true; if the present sample is passive, $b$ is false. The temporal origin $t_{j'}$ of a passive sample is distributed evenly at several different locations along the time axis, while $t_{j'}$ of an active sample is fixed due to the demand for obtaining energy fragments as close to the source as possible. 

The segmented time-series detector response dataset processed by a temporal energy window is denoted as $T_{seg}$ and represented by Eq. 3. Samples in $T_{seg}$ may have been expanded in comparison to $T$, which is dependent on $w_q$. 

In Eq. 3, $S^i_{seg}$ denotes the group of segmented time-series detector response data of $S_i$ and is represented by Eq. 4, with $seg = [seg_1, seg_2, \ldots, seg_{w_q}]$. 

$$S^i_{seg} = \{[t^i_{seg}, e^i_{seg}]\} \quad (4)$$

Employing $S^i_{seg_k}$ to symbolize each individual segmented sample of $S^i_{seg}$, $S^i_{seg_k}$ is denoted by Eq. 5. The length of each $S^i_{seg_k}$ is correlated with the length of the temporal energy window $w_l$. 

B. Peak-ratio spectrum analysis 


In this subsection, we delve into the formation mechanism and measurement principle of the gamma-ray instrument spectrum. These are leveraged as the foundation for extracting spectral features. Key features include aggregated counts, peak-to-flat ratios, and peak-to-peak ratios. This type of approach aids in the analysis and interpretation of the intrinsic significance of energy spectra. 

After processing through the temporal energy window, the segmented time-series detector response data is converted into an energy spectrum format, easing the subsequent feature extraction. The energy spectrum provides a distribution curve mapping the count rate against particle energy, a pivotal tool in detecting and identifying radioactive nuclear materials. 

For the context of this study, the relative distance between the detector and radiation source is in constant flux due to the detector's movement. It is essential to underline that this study primarily focuses on scenarios with static radiation sources. Dynamics, such as the continuous movement of the source or its dissolution in water, have not been contemplated. Owing to the finite number of photon counts within the full energy peak, statistical fluctuations become pronounced. Consequently, the channel with the peak counts might not align with the expected value of a Gaussian distribution [29, 30]. To mitigate the effects of these statistical fluctuations, spectral data are reorganized into multiple bins along the energy axis. Each bin encompasses an energy range, and counts within this range are consolidated to create a novel feature vector. 

The transformed spectrum dataset is denoted as $T_{spe}$. 
$E^i_{seg}$ is the transformed spectrum data from $S^i_{seg}$, with $i = 1, 2, \ldots, M$ and $seg = \{seg_1, seg_2, \ldots, seg_{w_q}\}$. $y_i \in \mathcal{Y}$ is the class label of $E^i_{seg}$. During the process of transformation, the information of $loc$ and $t$ is discarded. The transformed spectrum data $E^i_{seg}$ is in the form of a vector and is represented by Eq. 6. 

$$E^i_{seg} = \{(E^i_{seg_1}, y_i), (E^i_{seg_2}, y_i), \ldots, (E^i_{seg_{w_q}}, y_i)\} \quad (6)$$

$E^i_{seg_k}$ represents the transformed energy spectrum of the $k$-th segment of the $i$-th sample of the time-series detector response dataset. 

$$E^i_{seg_k} = \{x_1^{(i,k)}, x_2^{(i,k)}, \ldots, x_n^{(i,k)}\} \quad (7)$$

$i = 1, 2, \ldots, M$ and $k = 1, 2, \ldots, w_q$. $x_n^{(i,k)}$ is the aggregated count of the $n$-th bin of $E^i_{seg_k}$. 

The length of each $E^i_{seg_k}$, i.e., $n$, is the same because the same energy range is precomputed before the transformation process and a fixed number of bins is selected uniformly. The value of $n$ is determined based on the expected value of the maximum and minimum energy values across all samples in $T$. The expected values of the maximum and minimum energy values are denoted as $\lceil \mathbb{E}[\max(e^i)] \rceil$ and $\lfloor \mathbb{E}[\min(e^i)] \rfloor$, $i = 1, 2, \ldots, M$. $n \in \mathbb{Z}^+$ and the value of $n$ is calculated by Eq. 8. 

$$n = \lceil \mathbb{E}[\max(e^i)] \rceil - \lfloor \mathbb{E}[\min(e^i)] \rfloor \quad (8)$$

Furthermore, $x_n^{(i,k)}$ indicates the photon count in $e^i_{seg_k}$ in the corresponding energy interval, and $x_n^{(i,k)}$ is calculated by Eq. 9. Specifically, countif($A$, $B$) is a function that searches the range $A$ for items that match condition $B$ and counts them. Additionally, $\alpha$ is applied to represent the energy range to fine-tune the transform accuracy of the energy spectrum. 

$$x_n^{(i,k)} = \mathrm{countif}\left(e^i_{seg_k},\ \lfloor \mathbb{E}[\min(e^i)] \rfloor + (n-1) \times \alpha \le e^i_{seg_k} < \lfloor \mathbb{E}[\min(e^i)] \rfloor + n \times \alpha\right) \quad (9)$$
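The counting rule of Eq. 9 amounts to a fixed-width histogram over the energy axis. The sketch below is one possible NumPy rendering, not the authors' code: the function name `bin_spectrum` and its arguments are hypothetical, `e_min` stands for the lower energy bound used in Eq. 9, and events outside the binned range are simply dropped, which is an assumption.

```python
import numpy as np

def bin_spectrum(energies, e_min, n_bins, alpha):
    """Count events per bin; bin n (1-based) covers [e_min + (n-1)*alpha, e_min + n*alpha)."""
    energies = np.asarray(energies, dtype=float)
    # Zero-based bin index of every event.
    idx = np.floor((energies - e_min) / alpha).astype(int)
    # Assumption: events below e_min or beyond the last bin are discarded.
    keep = (idx >= 0) & (idx < n_bins)
    return np.bincount(idx[keep], minlength=n_bins)

# Ten 100-keV bins starting at 0 keV (illustrative numbers only).
counts = bin_spectrum([50, 100, 199.9, 250, 999, 1000, -5], e_min=0.0, n_bins=10, alpha=100.0)
```

The half-open interval reproduces the comparison operators of Eq. 9: an event sitting exactly on a bin boundary is counted in the upper bin.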


Detection and identification of radioactive materials pri- 
marily hinge on nuclear radiation detectors, which capture 
gamma rays emitted during the decay process. The mea- 
surement of gamma ray energy is determined by registering 
the energy dispersed within the detector. The main mecha- 
nisms driving gamma energy spectrum measurements encom- 
pass three interactions between gamma rays and the detector 
medium: the photoelectric effect, the Compton effect, and 
pair production. 


Low-energy gamma rays (from 0 to hundreds of keV) predominantly undergo the photoelectric effect, resulting in at least one distinct photoelectric peak. Medium-energy gamma rays (from hundreds of keV to 3 MeV) primarily interact through the Compton effect. Conversely, high-energy gamma rays (5-10 MeV and beyond) are primarily subject to pair production. The photoelectric peak, when the energy of the incident gamma radiation is below 1.02 MeV, is often termed as 
the full energy peak. This peak is traditionally considered as 
the primary hallmark for identifying specific radioactive nu- 
clides. The full energy peak arises from the sum of the photo- 
electric peak’s energy combined with energy from Compton 
electrons and photoelectrons stemming from Compton scat- 
tering interactions. In the spectrum of low-to-medium energy 
gamma rays, pair production is negligible. Instead, the energy 
spectrum is characterized by a Compton continuum and pho- 
toelectric peaks. When gamma rays possess intermediate en- 
ergy, incident gamma photons experience multiple successive 
Compton scatterings. The energy from the recoil electrons, 
produced from these scatterings, is deposited in the detector. 
Notably, the cumulative energy of these recoil electrons can 
surpass the energy transfer’s upper limit in a single scattering 
event, filling regions between the Compton edge and photo- 
electric peaks [9]. 
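The upper limit of the energy transferred in a single scattering event is the Compton edge, which follows from the standard Compton kinematics for 180-degree backscatter: $E_{edge} = E_\gamma \cdot \frac{2E_\gamma}{m_e c^2 + 2E_\gamma}$. As a small worked example, the sketch below evaluates it for the two $^{60}$Co lines quoted later in the paper; the function name is hypothetical and the electron rest energy is rounded to 511 keV.

```python
ELECTRON_REST_ENERGY_KEV = 511.0  # m_e c^2, rounded

def compton_edge_kev(e_gamma_kev):
    """Maximum recoil-electron energy in a single Compton scattering (backscatter)."""
    k = 2.0 * e_gamma_kev / ELECTRON_REST_ENERGY_KEV
    return e_gamma_kev * k / (1.0 + k)

for e in (1170.0, 1330.0):
    print(f"{e:.0f} keV photon: Compton edge near {compton_edge_kev(e):.0f} keV")
```

The region between each edge and its full energy peak is the part of the spectrum that multiple successive scatterings can fill.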


From the prior discussion on the formation mechanism and 
measurement principle of the gamma-ray instrument spec- 
trum, it is clear that gamma energy spectra contain both 
photoelectric peaks and the Compton continuum. Conven- 
tionally, the photoelectric peaks serve as the primary identi- 
fiers for radionuclides. Conversely, the Compton continuum, 
which often exhibits similar shapes across different contexts, 
is usually overlooked. However, relying solely on characteris- 
tic peaks for radioactive nuclide identification may fall short 
in complex background situations [31-33]. Drawing inspi- 
ration from Ref. [7], this subsection introduces peak-to-flat 
ratios and peak-to-peak ratios as descriptors for the spectral features. 

Equation 7 defines the form of the energy spectrum after binning. Here, $x_n^{(i,k)}$ indicates the photon counts in the corresponding energy bins and can be calculated by Eq. 9. Based on this, specific bins are selected according to the decay properties of the radionuclide material, which correspond to the area of the theoretical Compton continuum, characteristic peaks, and auxiliary peaks, respectively. For $E^i_{seg_k}$, the areas of the theoretical Compton continuum, characteristic peaks, and auxiliary peaks are represented as $a_c$, $a_f$, and $a_u$, respectively, and defined by Eqs. 10 to 12, with $i = 1, 2, \ldots, M$ and $k = 1, 2, \ldots, w_q$. 

The boundaries of the Compton continuum are denoted by $c_l$ and $c_r$, while the characteristic peaks are bounded by $f_l$ and $f_r$ and the auxiliary peaks are bounded by $u_l$ and $u_r$. These boundaries are selected based on the decay properties of the radionuclide material. The peak-to-flat ratio $r_1$ and peak-to-peak ratio $r_2$ are defined by Eq. 13. 

$$r_1 = a_f / a_c, \quad r_2 = a_u / a_f \quad (13)$$

From a macroscopic perspective, $r_1$ characterizes the capability to discern low-energy weak peaks amidst a complex background, while $r_2$ measures the likelihood of gamma rays experiencing multiple interactions within the detector, culminating in their contribution to the full-energy peaks. 
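A short sketch may make the two ratios concrete. It assumes that each area is the sum of the binned counts between the stated boundaries, inclusive, as Table 1 suggests; the bin boundaries in the example are made-up values, and the guard against an empty denominator is an addition that the paper does not discuss.

```python
import numpy as np

def band_sum(spectrum, left, right):
    """Sum the counts of bins left..right inclusive (1-based bin numbers)."""
    return float(np.sum(spectrum[left - 1:right]))

def peak_ratios(spectrum, c_band, f_band, u_band):
    """Return (r1, r2): peak-to-flat and peak-to-peak ratios of one binned spectrum."""
    a_c = band_sum(spectrum, *c_band)  # Compton continuum area
    a_f = band_sum(spectrum, *f_band)  # characteristic-peak area
    a_u = band_sum(spectrum, *u_band)  # auxiliary-peak area
    # Added safeguard: return 0 instead of dividing by an empty band.
    r1 = a_f / a_c if a_c > 0 else 0.0
    r2 = a_u / a_f if a_f > 0 else 0.0
    return r1, r2

# Hypothetical 10-bin spectrum and hypothetical band boundaries.
spectrum = np.array([10, 10, 10, 10, 40, 60, 5, 20, 30, 5])
r1, r2 = peak_ratios(spectrum, c_band=(1, 4), f_band=(5, 6), u_band=(8, 9))
```

Because both features are ratios of summed counts, they need no peak search or peak fitting, which is the simplification the method relies on.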


C. Classification 


In this subsection, considering the requirements for imbalanced multi-classification, we developed a detection and identification model using the weighted KNN architecture. By capitalizing on the inherent trait that energy spectra from identical radioactive materials exhibit minimal intra-class variability, the model significantly boosts the classification accuracy for underrepresented classes and improves the overall efficacy of the classifier. 

Following the peak-ratio spectrum analysis, we derive a set of feature vectors composed of aggregated counts, peak-to-flat ratios, and peak-to-peak ratios, denoted as $T_{app}$. 

$$T_{app} = \{(C^1_{seg}, y_1), (C^2_{seg}, y_2), \ldots, (C^M_{seg}, y_M)\}$$

$C^i_{seg}$ represents the set of feature vectors from $E^i_{seg}$, which is denoted by Eq. 14, with $i = 1, 2, \ldots, M$ and $seg = \{seg_1, seg_2, \ldots, seg_{w_q}\}$. $y_i \in \mathcal{Y}$ is the class label. 

$$C^i_{seg} = \{(C^i_{seg_1}, y_i), (C^i_{seg_2}, y_i), \ldots, (C^i_{seg_{w_q}}, y_i)\} \quad (14)$$

Here, $C^i_{seg_k}$ represents the feature vector of the $k$-th segment of the $i$-th sample in the original time-series detector response dataset. 

$$C^i_{seg_k} = [E^i_{seg_k}, r_1^{(i,k)}, r_2^{(i,k)}]$$

$i = 1, 2, \ldots, M$ and $k = 1, 2, \ldots, w_q$. $E^i_{seg_k}$, $r_1^{(i,k)}$, and $r_2^{(i,k)}$ denote the aggregated counts, peak-to-flat ratio, and peak-to-peak ratio of the $k$-th segment of the $i$-th sample in the original time-series detector response dataset, respectively. 

The sample to be classified is represented as $S_0$, with its corresponding feature vector symbolized as $c_0$. $c_0$ and $C^i_{seg_k}$ are $n$-dimensional vectors, i.e., $c_0 \in \mathbb{R}^n$ and $C^i_{seg_k} \in \mathbb{R}^n$. 

The function $f$ evaluated at the sample point $C^i_{seg_k}$ is $y_i$, i.e., $y_i = f(C^i_{seg_k})$. The vector of observations is defined as: 

$$\mathbf{y} = [y_1, y_2, \ldots, y_M]^T$$

To construct a surrogate model $\hat{f}$ of a function $f$, the sample points $C^i_{seg_k}$ are collected into the matrix $C$. The dimension of the matrix $C$ is $M \times w_q \times n$. 
Notation $c^{(i,k)}$ signifies $C^i_{seg_k}$. Here, $K$ denotes the number of nearest neighboring sample points. $Z$ denotes the set of $K$ sample points that are closest to $c_0$ in terms of distance, which is denoted by Eq. 16. $Z \subset C$ and $|Z| = K$. 

$$Z = \{c^{(i,k)} \in C : \mathrm{rank}(d(c_0, c^{(i,k)})) \le K\} \quad (16)$$

The function $\mathrm{rank}(d(c_0, c^{(i,k)}))$ represents the ranking of the distance $d(c_0, c^{(i,k)})$ in ascending order. 

The surrogate model $\hat{f}$ of function $f$ is defined by Eq. 17. 

$$\hat{f}(c_0) = \frac{\sum_{c^{(i,k)} \in Z} w(c_0, c^{(i,k)})\, y_i}{\sum_{c^{(i,k)} \in Z} w(c_0, c^{(i,k)})} \quad (17)$$

$w(\cdot)$ is the inverse distance weight function, which is defined by Eq. 18. 

$$w(c_0, c^{(i,k)}) = \frac{1}{[d(c_0, c^{(i,k)})]^q} \quad (18)$$

where $q$ denotes a normalization power and $d(c_0, c^{(i,k)})$ denotes the distance between the target point $c_0$ and sample point $C^i_{seg_k}$. The metric used to measure this distance is the $L_p$-norm, which is defined by Eq. 19. 

$$d(c_0, c^{(i,k)}) = \|c_0 - c^{(i,k)}\|_p = \left(\sum_{\beta=1}^{n} |c_{0,\beta} - c_\beta^{(i,k)}|^p\right)^{1/p} \quad (19)$$

Here, $\beta$ denotes the index of the dimension of the vector. $c_\beta^{(i,k)}$ denotes the $\beta$-th dimension of $C^i_{seg_k}$, and $c_{0,\beta}$ denotes the $\beta$-th dimension of $c_0$. Specifically, the $L_1$-norm (where $p = 1$) represents the rectangular distance, while the $L_2$-norm (where $p = 2$) represents the Euclidean norm. Furthermore, the $L_\infty$-norm (where $p \to \infty$) represents the maximum norm. To limit the effect of farther sample points and also avoid divisions by zero, the distance function is implemented as follows. If $\|c_0 - c^{(i,k)}\|_p = 0$, then the distance $d(c_0, c^{(i,k)})$ is set to $\varepsilon$. If $0 < \|c_0 - c^{(i,k)}\|_p \le R$, then the distance $d(c_0, c^{(i,k)})$ is calculated by Eq. 19. If $\|c_0 - c^{(i,k)}\|_p > R$, then the distance $d(c_0, c^{(i,k)}) = 0$. Here, $\varepsilon$ is a small number, and $R$ is the radius of the distance function $d(\cdot)$. Table 1 summarizes the overall flow of the algorithm. 
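The classification step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' model: the function name `weighted_knn_predict` is hypothetical; neighbours farther than `R` are given zero weight, which is one reading of the radius rule above; and, instead of the weighted average of label values, the sketch returns the class with the largest summed inverse-distance weight, a common substitute when labels are categorical.

```python
import numpy as np

def weighted_knn_predict(c0, C, y, K=5, p=2, q=2, R=np.inf, eps=1e-9):
    """Predict the label of c0 from the rows of C using inverse-distance weighted KNN."""
    C = np.asarray(C, dtype=float)
    y = np.asarray(y)
    # L_p distance from c0 to every stored feature vector.
    d = np.linalg.norm(C - np.asarray(c0, dtype=float), ord=p, axis=1)
    nearest = np.argsort(d)[:K]  # indices of the K closest points
    d_k = d[nearest]
    # Zero distances are replaced by eps; points beyond R get zero weight (assumption).
    w = np.where(d_k > R, 0.0, 1.0 / np.maximum(d_k, eps) ** q)
    # Weighted vote per class (swapped in for the weighted label average).
    scores = {}
    for label, weight in zip(y[nearest], w):
        scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get)

# Toy data: class 0 clustered near the origin, class 1 near (1, 1) plus one far point.
C = [[0, 0], [0.1, 0], [0, 0.1], [1, 1], [1.1, 1], [5, 5]]
y = [0, 0, 0, 1, 1, 1]
label = weighted_knn_predict([0.9, 0.9], C, y, K=3)
```

Because the weight decays with distance, a few very close neighbours from a minority class can outvote more numerous but more distant majority-class neighbours, which is the motivation for weighting in the imbalanced setting.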


III. EXPERIMENTS AND ANALYSIS 


This section details the processes of data acquisition and preprocessing and establishes a series of comparative experiments to validate the efficacy of the proposed algorithm. All experiments conducted in this section utilized ten-fold cross-validation to guarantee the reliability of the results. 


A. Introduction of data source 


The experimental data utilized in this study originated from a time-series detector response dataset, representing a NaI(Tl) detector's movement within a simulated city block using the Monte Carlo method. This dataset was curated by J. M. Ghawaly Jr and his team at Oak Ridge National Laboratory (ORNL) [34]. Fig. 2 offers a visual representation capturing the core features of the dataset. The model simulated seven interconnected city blocks in three dimensions, encompassing various buildings, sidewalks, roads, parking areas, and other urban elements. The naturally occurring radioactive materials (NORM) incorporated included $^{40}$K, $^{232}$Th and its progeny, as well as $^{235}$U/$^{238}$U and their respective offspring. The concentration of each component within the KUT (potassium, uranium, and thorium) might vary depending on the specific material. Each block's background radiation was individually computed. The radioactive materials were potentially concealed in 15 distinct spots. Each radioactive source could exist in one of two states: either unshielded or shielded by 1 cm of lead. A NaI(Tl) detector navigated through these city blocks without the interference of cars or other forms of clutter. 


Table 1. Overall flow of the proposed method 

Algorithm: Algorithm for detection and identification of radioactive materials in an urban environment 

Input: 
  $S_i$: Time-series detector response data; 
  $y_i$: Class label; 
  $loc_i$: Time point (detector closest to the source); 
  $w_q$: Quantity of the temporal energy window; 
  $w_l$: Length of the temporal energy window; 
  $\alpha$: Energy range of bins; 
  $b$: Sample state, true for active, false for passive; 
  $S_0$: Sample to be classified; 
  $K$: Number of nearest neighbors; 

Begin: 
  1. Compute the length of $S_i$: 
     $m = \mathrm{length}(S_i)$ 
  2. Search the temporal origin of the energy window: 
     compute $j'$ by Eq. 2 
  3. Preprocess the time-series detector response data to obtain segmented pulses: 
     $S^i_{seg} = \{[t^i_{seg}, e^i_{seg}]\}$ 
  4. Calculate the count value in bins: 
     $x_n^{(i,k)} = \mathrm{countif}(e^i_{seg_k}, Cond)$, 
     $Cond = E + (n-1) \cdot \alpha \le e^i_{seg_k} < E + n \cdot \alpha$, in which 
     $n = \lceil \mathbb{E}[\max(e^i)] \rceil - \lfloor \mathbb{E}[\min(e^i)] \rfloor$ and $E = \lfloor \mathbb{E}[\min(e^i)] \rfloor$; 
  5. Segmented energy spectrum is obtained: 
     $E^i_{seg_k} = \{x_1^{(i,k)}, x_2^{(i,k)}, \ldots, x_n^{(i,k)}\}$ 
  6. Calculate boundary values of Compton continuum, characteristic peaks and auxiliary peaks: 
     $a_c = \sum_{n \in [c_l, c_r]} x_n^{(i,k)}$ 
     $a_f = \sum_{n \in [f_l, f_r]} x_n^{(i,k)}$ 
     $a_u = \sum_{n \in [u_l, u_r]} x_n^{(i,k)}$ 
  7. Obtain peak-to-flat ratio and peak-to-peak ratio: 
     $r_1 = a_f / a_c$, $r_2 = a_u / a_f$ 
  8. Obtain feature vector: 
     $C^i_{seg_k} = [E^i_{seg_k}, r_1^{(i,k)}, r_2^{(i,k)}]$ 
  9. Calculate the distance between $c_0$ and sample points: 
     $d(c_0, c^{(i,k)}) = \|c_0 - c^{(i,k)}\|_p$ 
  10. Search $K$ sample points that are closest to $c_0$ in terms of distance: 
     $Z = \{c^{(i,k)} \in C : \mathrm{rank}(d(c_0, c^{(i,k)})) \le K\}$ 
  11. Calculate the weight of $K$ sample points: 
     $w(c_0, c^{(i,k)}) = 1 / [d(c_0, c^{(i,k)})]^q$ 
End 

Output: 
  Predict the class label of $S_0$: 
  $\hat{f}(c_0) = \sum_{c^{(i,k)} \in Z} w(c_0, c^{(i,k)})\, y_i \,/\, \sum_{c^{(i,k)} \in Z} w(c_0, c^{(i,k)})$ 


Fig. 2. (Color online) Schematic diagram of the fundamental characteristics of the dataset. Legend: buildings, parking lots, etc.; sidewalks and roads; modular city blocks; radioactive source; radiation influence area; vehicle with detector. This model consisted of seven modular city blocks, and the order of the blocks can be adjusted. Size of the model was 989-1047 m x 201 m x 158 m. For each component of the blocks, every NORM isotope in each material (asphalt, brick, granite, concrete, and soil) in its composition was modeled. These data form the background of the urban environment. A 2" x 4" x 16" NaI(Tl) detector traversed the city block in the absence of cars or other forms of clutter. The velocity of the detector was a value in the range of 1 to 13.4 m/s and remained constant. The walls of the buildings in the model were 6 in (15.24 cm) thick [34]. 


The dataset comprises radioactive materials from two categories: special nuclear materials (SNMs) and common sources. The SNMs are represented by highly enriched uranium (HEU) and weapons-grade plutonium (WGPu), while the common sources are Technetium-99m ($^{99m}$Tc), Iodine-131 ($^{131}$I), and Cobalt-60 ($^{60}$Co). Both HEU and WGPu are characterized by energy spectra dominated by prompt fission neutrons and prompt gamma rays, which are emitted during fission. These gamma rays possess a broad energy range, spanning from several hundred keV up to multiple MeV. Conversely, $^{99m}$Tc releases gamma rays that predominantly linger around the 140-keV energy mark. $^{131}$I emits mainly beta particles accompanied by gamma rays; the beta particles peak at energies near 606 keV. The emitted gamma rays have varying energy levels, with the most notable peaks observed at 364 keV and 637 keV. Finally, $^{60}$Co radiates gamma rays that prominently feature two energy peaks, one at 1.17 MeV and the other at 1.33 MeV [9]. 

Specifically, 9700 samples were labeled and listed in Ta- 
ble 2, of which 4900 were background samples without any 
radioactive materials, while the remaining 4800 samples con- 
tained radioactive materials. 


B. Comparative experiments 


In this subsection, to optimize and assess the model’s per- 
formance while ensuring its practical applicability, the time- 
series detector response dataset was partitioned into three 


Table 2. Radionuclide library


Label   Radioactive materials   Capacity
0       Background              4900
1       HEU                     800
2       WGPu                    800
3       131I                    800
4       60Co                    800
5       99mTc                   800
6       HEU+99mTc               800


distinct subsets: training (60%), validation (20%), and test- 
ing (20%). This division was subjected to ten-fold cross- 
validation. A stratified random split was adopted, guarantee- 
ing a balanced representation of radioactive materials across 
all subsets. The model was implemented in Python, leverag- 
ing the capabilities of the PyTorch framework. For compar- 
ative analysis, the Weka machine learning toolkit was em- 
ployed. All experiments were executed on a system fur- 
nished with an Intel Core i7 processor, 16GB of RAM, and 
an NVIDIA GeForce RTX 3070 graphics card. 
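
As an illustration of the stratified 60/20/20 partition described above, the following minimal sketch splits sample indices class by class so that every subset keeps the class proportions of Table 2. It is not the authors' implementation: the function name `stratified_split`, the rounding rule, and the fixed seed are assumptions made for this example, and only the class sizes are taken from Table 2.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, val, test) index lists that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test subset
    return train, val, test


# Class sizes taken from Table 2 (label -> number of samples).
class_sizes = {0: 4900, 1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800}
labels = [label for label, n in class_sizes.items() for _ in range(n)]

train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

With the class sizes of Table 2, this yields 5820 training, 1940 validation, and 1940 test samples, and each subset keeps the same ratio between background and source classes as the full dataset.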

To streamline the discussion, the proposed algorithm in the 
experimental results will be referred to as TPW. In the exper- 
iments detailed in this subsection, the values of K, p, and q 
were set to 5, 2, and 2, respectively. A comprehensive exami- 
nation and discussion regarding the selection of these param- 
eter values can be found in Sect. II E. The testing accuracy 
achieved was 99.1%, with an F1 score of 0.9868. The confu- 
sion matrix derived from the test data can be viewed in Fig. 3. 


[Fig. 3 image: confusion matrix over the classes 0-Background, 1-HEU, 2-WGPu, 3-I131, 4-Co60, 5-Tc99m, and 6-HEU+Tc99m, with row and column summaries.]
Fig. 3. Confusion matrix for the proposed algorithm’s test results.
Each cell within the matrix’s core is normalized according to the to- 
tal observations of the respective class, illustrating the proportion of 
correctly identified samples within the whole dataset. The column 
summary indicates the percentage of correct and incorrect classifi- 
cations for each predicted class, scaled by the overall observations 
of that predicted class. Similarly, the row summary portrays the per- 
centage of correct and incorrect classifications for each actual class, 
adjusted by the total observations of that specific class. 
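
The row and column summaries described in this caption correspond to per-class recall and precision, respectively. A minimal sketch of how such summaries could be computed from a matrix of raw counts is given below. The function name `summarize_confusion` and the 3-class toy matrix are invented for illustration and are not the paper's data.

```python
def summarize_confusion(matrix):
    """matrix[i][j] = number of samples of actual class i predicted as class j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision = []  # column summary: correct fraction per predicted class
    recall = []     # row summary: correct fraction per actual class
    for k in range(n):
        row_total = sum(matrix[k])
        col_total = sum(matrix[i][k] for i in range(n))
        recall.append(matrix[k][k] / row_total if row_total else 0.0)
        precision.append(matrix[k][k] / col_total if col_total else 0.0)
    accuracy = sum(matrix[k][k] for k in range(n)) / total
    return precision, recall, accuracy


# Toy 3-class counts, invented for this example (not the paper's results).
toy = [
    [50, 0, 0],
    [1, 18, 1],
    [0, 2, 8],
]
precision, recall, accuracy = summarize_confusion(toy)
print(precision, recall, accuracy)
```

Dividing each cell by the grand total instead of by its row or column total gives the share of the whole dataset that falls in that cell, which is a different normalization from the two summaries.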


In summary, the TPW algorithm has shown promising results
in both passive backgrounds and active scenarios. Analysis
of the column and row summaries indicates that the
most notable misclassifications occur among classes 1 (HEU),
5 (99mTc), and 6 (HEU+99mTc). The primary reason is that
the radioactive materials present in the detection scene
of class 6 are also present in classes 1 and 5, causing
ambiguity in the identification process.

Furthermore, to provide a comprehensive comparison, the 
standard KNN (KNN) [16, 35], support vector machine 
(SVM) [17, 36], Bayesian network (BayesNet) [18, 37], ran- 
dom tree (RandomTree) [19, 38], and the proposed algorithm 
(TPW) were applied for evaluation. These methods have been
commonly used for radionuclide identification and
classification in recent years. Comparative experiments
were conducted using the Weka machine learning toolkit [39] 
with a batch size of 100 and ten-fold cross-validation. The 
main parameters used for each method were as follows: For 
standard KNN, the number of neighbors was set to 5 with 
no distance weighting. For SVM, the Poly Kernel was used 
as the kernel function. The complexity and tolerance param- 
eters were set to 1.0 and 0.001, respectively. For Bayesian 
network, the initial network structure used for learning was 
the Naive Bayes Network and the maximum number of par- 
ents that a node in the Bayes net can have was limited to 1. 
For Random Tree, the random number seed used for selecting 
attributes was 1, and the minimum total weight of instances 
in a leaf was set to 1.0. The maximum depth of the tree was 
unlimited. 

Given the imbalanced nature of the dataset, relying solely
on classification accuracy can be misleading. For instance,
a model that assigns every sample to the majority class,
'Background', would still reach an accuracy of about 50.5%
(4900 of 9700 samples) without identifying a single source.
Hence, several evaluation metrics were used in this
subsection to provide a fuller view of model performance.
Fig. 4 presents these metrics for the different models.
Across all five metrics (accuracy, F1 score, MCC, ROC Area,
and PRC Area), TPW outperformed the four other methods
(standard KNN, SVM, Bayesian network, and random tree),
indicating greater efficacy and reliability in radionuclide
identification.
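
To make the imbalance argument concrete, the sketch below scores a trivial classifier that always predicts 'Background' on a dataset with the class sizes of Table 2. The helper `per_class_f1` and the use of an unweighted (macro) average are assumptions for this example; the averaging used for the reported F1 values is not stated in this section.

```python
def per_class_f1(true_labels, predicted_labels, classes):
    """Return {class: F1} computed from true and predicted label lists."""
    scores = {}
    for c in classes:
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


# Class sizes taken from Table 2 (label -> number of samples).
class_sizes = {0: 4900, 1: 800, 2: 800, 3: 800, 4: 800, 5: 800, 6: 800}
true_labels = [label for label, n in class_sizes.items() for _ in range(n)]

# Hypothetical baseline: every sample is called class 0 (Background).
predicted = [0] * len(true_labels)

accuracy = sum(t == p for t, p in zip(true_labels, predicted)) / len(true_labels)
f1 = per_class_f1(true_labels, predicted, sorted(class_sizes))
macro_f1 = sum(f1.values()) / len(f1)
print(f"accuracy={accuracy:.4f}, macro F1={macro_f1:.4f}")
```

The baseline reaches an accuracy of about 0.505 while its F1 is zero for every source class, which is why accuracy alone is not a sufficient criterion here.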

We further performed individual tests for each class of
samples, comparing the TPW algorithm with the four other
methods using the F1 measure. Fig. 5 shows the classification
results of the different models for every sample class.
Overall, the TPW algorithm was the top performer among the
tested methods. In particular, it classified samples from
every class accurately, underscoring its robustness and
adaptability to various sample types.

Examining the results in depth, we note that the efficacy
of the different methods varies considerably across classes.
For some classes, the TPW algorithm notably surpasses the F1
measures of its competitors, while for others the differences
are smaller. This indicates that the TPW algorithm is
especially adept at processing certain sample types, although
its relative advantage is less distinct for others. Notably,
the F1 measure for classes 0 (background), 3 (131I), and
4 (60Co) is higher, whereas it is somewhat lower for classes
1 (HEU), 2 (WGPu), 5 (99mTc), and 6 (HEU+99mTc). By analyzing
the peak energies of the materials in the latter group, we
discerned that their characteristic peaks are all below
200 keV. This suggests that their accurate detection and
identification might be compromised by the Compton continuum.
Nevertheless, the TPW algorithm excels over other models,
surpassing them by at least 0.18% in multi-isotope scenarios
such as HEU+99mTc, and shows lower variability than other
models when detecting radioactive materials.

Fig. 4. Multiple evaluation metrics across various models. The
x-axis represents the evaluation metrics, including accuracy
(Acc), F1 measure (F-Measure), Matthews correlation coefficient
(MCC), receiver operating characteristic area (ROC Area), and
precision-recall curve area (PRC Area). The y-axis shows the
performance values for each method under the corresponding
metrics for every class of samples.

Fig. 5. F1 measure for each class of samples across various
models. The F1 measure values of the different methods for each
class of samples are plotted on the y-axis, while the x-axis
indicates the corresponding class of samples as listed in Table 2.


C. Discussion on temporal energy window 


In this subsection, we explored the effects of varying the
parameters of the temporal energy window, specifically
the number and duration of the windows.
The movement of the detector poses challenges in determining
the ideal length for an energy window. A brief window,
such as 1 s, might not capture sufficient relevant data, while
an extended window, such as 20 s, could introduce substantial
noise, potentially overshadowing crucial signals. Fig. 6
depicts the energy spectra for a sample from class 4 (60Co) at
varying energy window durations. As the window length
increased, there was a noticeable rise in the counts of the
energy spectrum. However, the count values at different
energies did not grow linearly with the window length,
possibly due to statistical fluctuations and other
influencing factors [29, 30]. For deeper insight into the
spectral lines, Fig. 7 highlights the variations in
the morphology and attributes of the energy spectra as the
window length moves through five distinct durations.

Fig. 6. (Color online) Energy spectra with respect to different
temporal energy window lengths. This figure illustrates the
energy spectra of 60Co for temporal energy window lengths of
1, 3, 5, 10, and 20 s. The energy range shown is 0-3000 keV. The
count value displayed on the Z-axis shows an increasing trend
with increasing window length.
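
The partitioning of time-series detector data into windowed energy spectra can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: the input is assumed to be a list of (time, energy) events, the windows are assumed to be contiguous and non-overlapping, and the names `windowed_spectra`, `window_s`, `n_windows`, and `n_bins` are hypothetical. Only the 0-3000 keV range is taken from the figure.

```python
def windowed_spectra(events, window_s, n_windows, e_max_kev=3000.0, n_bins=100):
    """Histogram event energies into one spectrum per time window.

    events: iterable of (time_s, energy_kev) pairs.
    Returns a list of n_windows lists, each holding n_bins counts.
    """
    bin_width = e_max_kev / n_bins
    spectra = [[0] * n_bins for _ in range(n_windows)]
    for time_s, energy_kev in events:
        window = int(time_s // window_s)
        if not 0 <= window < n_windows:
            continue  # event falls outside the analysed time span
        if not 0.0 <= energy_kev < e_max_kev:
            continue  # event falls outside the analysed energy range
        spectra[window][int(energy_kev // bin_width)] += 1
    return spectra


# Invented events for illustration: (time in s, energy in keV).
events = [(0.2, 1173.0), (0.9, 1332.0), (1.5, 140.0), (4.1, 1173.0), (7.0, 5000.0)]
spectra = windowed_spectra(events, window_s=3.0, n_windows=2, n_bins=30)
print([sum(s) for s in spectra])
```

Lengthening `window_s` accumulates more counts per spectrum but also mixes in more background recorded while the detector is far from the source, which is the trade-off discussed above.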


Figure 8 illustrates the model’s performance changes in re- 
lation to varying window lengths and quantities. The data 
indicates that as the number of temporal energy windows in- 
creases, the model’s prediction accuracy also rises, particu- 
larly when the window quantity equals 5. With the increase 
in the number of temporal energy windows, the proposed 
algorithm captures a richer representation of the signal, en- 
hancing the differentiation between signal and noise, and thus 
providing a more precise target prediction. Additionally, the 
model’s prediction accuracy trends upward with longer win- 
dow durations. However, there is a significant decline in ac- 
curacy with excessively long windows. These experiments 
shed light on the influence of window lengths and quantities 
on classification efficacy, emphasizing the need for optimal 
temporal scale selection and feature extraction methods for 
accurate classification in the given task. 


[Fig. 7 panels: energy spectra (x-axis: Energy/keV, 0-3000) for window lengths of (a) 1 s, (b) 3 s, (c) 5 s, (d) 10 s, and (e) 20 s.]


Fig. 7. (Color online) Detailed energy spectra across varied window lengths. These five diagrams offer an in-depth examination of the energy
spectra of 60Co, captured at different temporal energy window durations, complementing Fig. 6. The inset images in the upper right magnify
the spectrum within the highlighted red boxes. The irregular growth in count values at various energy addresses, as the window length
expands, can be linked to statistical fluctuations and other potential influences.

Fig. 8. [...] lengths and quantities. The plot displays the F1 measure values for
various parameter combinations, with the x-axis representing the energy
window lengths.


D. Discussion on peak-ratio spectrum analysis 


In this subsection, we explored the influence of individ- 
ual and combined features on the classification performance 
using ablation experiments. Table 3 contrasts the classifi- 
cation outcomes of the aggregated energy spectrum counts, 
peak-flat ratio, and peak-peak ratio features against those us- 
ing the joint features, shedding light on their individual and 
combined impacts. 


Table 3. Results of ablation experiments


No.   Feature             Length   Test F1 measure
1     Aggregated counts   99       0.9843
2     Peak-flat ratio     22       0.9245
3     Peak-peak ratio     30       0.8435
4     Joint feature       151      0.9868


The F1 measure comparisons reveal that the joint feature
outperforms each individual feature: it reaches a test F1
measure of 0.9868, compared with 0.9843 for aggregated
counts, 0.9245 for the peak-flat ratio, and 0.8435 for the
peak-peak ratio. The joint feature describes the energy
spectrum from several complementary viewpoints, and
combining them helps offset weaknesses of the individual
features, such as noise interference, data sparsity, or
incomplete representation. Singular features, by contrast,
often struggle to offer a comprehensive and resilient basis
for classification [23]. The empirical findings therefore
support the use of joint features for more accurate
radionuclide identification.
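
The joint feature in Table 3 has length 151, which equals the sum of the three individual feature lengths (99 + 22 + 30). A minimal sketch of assembling it by concatenation is given below. The block order, the dictionary keys, and the function name `joint_feature` are assumptions for illustration; how each block is computed from a spectrum is defined elsewhere in the paper and is not reproduced here.

```python
# Feature block lengths taken from Table 3; the order is an assumption.
FEATURE_LENGTHS = {
    "aggregated_counts": 99,
    "peak_flat_ratio": 22,
    "peak_peak_ratio": 30,
}


def joint_feature(blocks):
    """Concatenate the three feature blocks into one joint feature vector."""
    vector = []
    for name, expected_length in FEATURE_LENGTHS.items():
        block = list(blocks[name])
        if len(block) != expected_length:
            raise ValueError(f"{name}: expected {expected_length}, got {len(block)}")
        vector.extend(block)
    return vector


# Placeholder values only; real blocks would be computed from a spectrum.
example_blocks = {name: [0.0] * n for name, n in FEATURE_LENGTHS.items()}
joint = joint_feature(example_blocks)
print(len(joint))
```

An ablation run in the sense of Table 3 would then train the same classifier on a single block instead of the concatenated vector.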


E. Discussion on classification model 


In this subsection, we examined the proposed algorithm 
by assessing the impact of three factors: the distance metric, 
value of K, and distance weight. We experimented with vari- 
ous distance metrics, including Euclidean, City block, Cheby- 
shev, Correlation, Spearman, Hamming, Jaccard, and Cosine. 
The value of K was varied between 1 and 20. For distance 
weighting, we considered three approaches: "Equal distance" 
(ED), which did not incorporate any weight; "Inverse dis- 
tance" (ID), where the weight was based on the inverse of 
the distance to the data point; and "Inverse distance squared" 
(IDS), where the weight was determined by the inverse of the 
squared distance to the data point. The experimental results 
are listed in Table 4. During the experiments, K was set to 10 
and all the samples were subjected to standardization. 
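
The three distance-weighting options can be illustrated with a small distance-weighted KNN vote. This is a hedged sketch using the Euclidean metric only. It does not reproduce the full TPW weighting, whose parameters are defined in an earlier section, and the names `neighbor_weight`, `knn_predict`, and the small constant `eps` (added to avoid division by zero) are assumptions for this example.

```python
import math
from collections import defaultdict


def neighbor_weight(distance, scheme, eps=1e-12):
    """Vote weight of one neighbor under the ED, ID, or IDS scheme."""
    if scheme == "ED":
        return 1.0                           # equal weight for every neighbor
    if scheme == "ID":
        return 1.0 / (distance + eps)        # inverse distance
    if scheme == "IDS":
        return 1.0 / (distance ** 2 + eps)   # inverse distance squared
    raise ValueError(f"unknown weighting scheme: {scheme}")


def knn_predict(train_x, train_y, query, k=5, scheme="IDS"):
    """Predict the label of query by a weighted vote of its k nearest neighbors."""
    neighbors = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    votes = defaultdict(float)
    for distance, label in neighbors:
        votes[label] += neighbor_weight(distance, scheme)
    return max(votes, key=votes.get)


# Invented 2-D points: class 1 is the minority class near the query.
train_x = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.9, 1.0)]
train_y = [1, 1, 0, 0, 0]
query = (0.2, 0.1)

for scheme in ("ED", "ID", "IDS"):
    print(scheme, knn_predict(train_x, train_y, query, k=5, scheme=scheme))
```

In this toy case the unweighted vote (ED) follows the three distant majority-class points, whereas ID and IDS follow the two nearby minority-class points, which shows how distance weighting can help minority classes.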


Table 4. Results of discussion on classification model


No.   Distance metric   Distance weight   Verification F1   Test F1
1     Euclidean         IDS               0.9672            0.9763
2     City block        IDS               0.9727            0.9814
3     Chebyshev         IDS               0.8761            0.8872
4     Correlation       IDS               0.9749            0.9762
5     Spearman          IDS               0.9639            0.9644
6     Hamming           IDS               0.3575            0.3811
7     Jaccard           IDS               0.3575            0.3811
8     Cosine            IDS               0.9707            0.9711
9     Euclidean         ID                0.9505            0.9595
10    Euclidean         ED                0.9238            0.9333

Based on the F1 measures observed during validation and
testing, the Euclidean, City block, Correlation, and Cosine
metrics all performed well under IDS weighting, with test F1
measures above 0.97, and City block reached the highest test
value (0.9814). The Hamming and Jaccard metrics, by contrast,
were unsuitable for these features (test F1 of 0.3811). In
terms of distance weighting, IDS was the best performer: with
the Euclidean metric, the test F1 measure rose from 0.9333
(ED) and 0.9595 (ID) to 0.9763 (IDS). By significantly
reducing the influence of far-off points on the
classification decision, IDS led to more precise and
dependable results. Collectively, the experimental data
indicated that the Euclidean distance metric combined with
the IDS weighting scheme is a suitable choice for the
proposed algorithm when tackling classification tasks.

Selecting an appropriate value of K was crucial for op- 
timal model performance. A very small K makes the model 
vulnerable to noise in the feature points, which can greatly in- 
fluence classification outcomes. Conversely, an overly large 
K dilutes the specificity of the model as the neighborhood 
around the training instance becomes too expansive, increas- 
ing the likelihood of misclassifications [16, 35]. Thus, strik- 
ing a balance between noise resistance and model precision 
by carefully adjusting the K value is imperative. We
undertook a series of tests to discern the effect of
different K values, ranging from 1 to 20, on the efficacy
of the proposed algorithm. The outcomes of these tests are
depicted in Fig. 9.

Fig. 9. [...] This figure showcases the F1 measure of the
proposed algorithm for K values spanning from 1 to 20, plotted
on the X-axis. The Y-axis indicates the F1 measure. Two distinct
curves denote the outcomes from verification and test
experiments, respectively.


From the results, it can be observed that the F1 score
initially increases as the value of K increases from 1 to 5,
reaching a peak value of 0.9868 at K=5. The F1 score then
fluctuates slightly and starts to decline as K increases
further. Additionally, the results suggest that the model has
a high overall performance, with F1 scores consistently above
0.95 for all values of K. This indicates that the model is
effective in accurately predicting the class labels of the
input data. This pattern suggests that increasing the value
of K can lead to better classification performance up to a
certain point, beyond which the neighborhood becomes too
broad and over-smoothing may occur, resulting in a decline in
performance.


IV. CONCLUSION


This study introduces a novel approach for detecting and
identifying radioactive materials within urban settings.
[...] when viewed as a time series, are effectively segmented us-
[...] streamlines the downstream data processing. This segmenta-
[...] the measurement principles of gamma-ray instruments, yield-
[...] The features we extract, including aggregated counts, peak-
to-flat ratios, and peak-to-peak ratios, offer a comprehen-
[...] However, this technology does come with certain limita-
[...] fected by improvised nuclear devices (INDs) or tactical nu-
clear artifacts. The detectors are vulnerable to electromag-
netic pulses (EMPs) and prompt gamma rays, which span a
[...] prowess. Moreover, this technique holds promise for broader
[...]